Overview

Dataset statistics

                                Dataset A    Dataset B
Number of variables             12           12
Number of observations          446          446
Missing cells                   432          439
Missing cells (%)               8.1%         8.2%
Duplicate rows                  0            0
Duplicate rows (%)              0.0%         0.0%
Total size in memory            45.3 KiB     45.3 KiB
Average record size in memory   104.0 B      104.0 B

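Missing cells (%) is the number of missing cells divided by the total number of cells (observations x variables). The totals agree with the per-variable counts further down: 81 (Age) + 351 (Cabin) = 432 for Dataset A, and 101 + 337 + 1 (Embarked) = 439 for Dataset B. A minimal plain-Python sketch, assuming rows held as dicts with `None` marking a missing cell (the report itself works on a DataFrame):

```python
def missing_cells(rows: list[dict]) -> tuple[int, float]:
    """Count missing cells (None) and express them as a % of all cells."""
    n_vars = len(rows[0]) if rows else 0
    total_cells = len(rows) * n_vars
    missing = sum(1 for row in rows for value in row.values() if value is None)
    pct = 100.0 * missing / total_cells if total_cells else 0.0
    return missing, pct


# The reported figures: 446 observations x 12 variables = 5352 cells.
for label, n_missing in (("Dataset A", 432), ("Dataset B", 439)):
    print(f"{label}: {100 * n_missing / (446 * 12):.1f}%")  # 8.1%, then 8.2%

# A toy table with one missing cell out of four.
print(missing_cells([{"Age": 22.0, "Cabin": None}, {"Age": 38.0, "Cabin": "C85"}]))  # (1, 25.0)
```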
Variable types

              Dataset A    Dataset B
Numeric       5            5
Categorical   4            4
Text          3            3

Alerts

Alert     Dataset A                              Dataset B
Missing   Age has 81 (18.2%) missing values      Age has 101 (22.6%) missing values
Missing   Cabin has 351 (78.7%) missing values   Cabin has 337 (75.6%) missing values
Unique    PassengerId has unique values          PassengerId has unique values
Unique    Name has unique values                 Name has unique values
Zeros     SibSp has 306 (68.6%) zeros            SibSp has 308 (69.1%) zeros
Zeros     Parch has 336 (75.3%) zeros            Parch has 338 (75.8%) zeros
Zeros     Fare has 5 (1.1%) zeros                Fare has 9 (2.0%) zeros

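Each Missing and Zeros alert is a count expressed as a share of the 446 observations. The sketch below reproduces the alert wording for one column. The 1% trigger threshold is an assumption, chosen because it is consistent with this report: Fare's 5 zeros (1.1%) are flagged, while the single missing Embarked value in Dataset B (0.2%) is not.

```python
def column_alerts(name: str, values: list, threshold: float = 0.01) -> list[str]:
    """Build Missing and Zeros alert messages for one column.

    `threshold` (a fraction of all observations) is an assumed trigger level.
    """
    n = len(values)
    n_missing = sum(1 for v in values if v is None)
    n_zeros = sum(1 for v in values if v is not None and v == 0)
    alerts = []
    if n_missing / n > threshold:
        alerts.append(f"{name} has {n_missing} ({100 * n_missing / n:.1f}%) missing values")
    if n_zeros / n > threshold:
        alerts.append(f"{name} has {n_zeros} ({100 * n_zeros / n:.1f}%) zeros")
    return alerts


# Fare in Dataset A: 5 zeros among 446 observations (the non-zero fare is a placeholder).
print(column_alerts("Fare", [0.0] * 5 + [7.25] * 441))  # ['Fare has 5 (1.1%) zeros']
```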
Reproduction

                         Dataset A                    Dataset B
Analysis started         2024-05-07 19:22:23.903915   2024-05-07 19:22:27.789959
Analysis finished        2024-05-07 19:22:27.788810   2024-05-07 19:22:31.810183
Duration                 3.88 seconds                 4.02 seconds
Software version         ydata-profiling v0.0.dev0    ydata-profiling v0.0.dev0
Download configuration   config.json                  config.json

Variables

PassengerId
Real number (ℝ)

                 Dataset A    Dataset B
Distinct         446          446
Distinct (%)     100.0%       100.0%
Missing          0            0
Missing (%)      0.0%         0.0%
Infinite         0            0
Infinite (%)     0.0%         0.0%
Mean             438.5583     438.30269
Minimum          2            2
Maximum          889          891
Zeros            0            0
Zeros (%)        0.0%         0.0%
Negative         0            0
Negative (%)     0.0%         0.0%
Memory size      7.0 KiB      7.0 KiB

Quantile statistics

                           Dataset A    Dataset B
Minimum                    2            2
5th percentile             44.5         38.5
Q1                         220.5        202.25
Median                     435.5        443.5
Q3                         653.75       671.5
95th percentile            848.75       840.25
Maximum                    889          891
Range                      887          889
Interquartile range (IQR)  433.25       469.25

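Quantiles are interpolated between neighbouring sorted values, which is why an integer ID column gets a Q1 of 220.5 and a Q3 of 653.75; the IQR is simply Q3 - Q1 (653.75 - 220.5 = 433.25 for Dataset A). A sketch using `statistics.quantiles(..., method="inclusive")`, which interpolates linearly between order statistics like the pandas and NumPy defaults; that the report uses exactly this rule is an assumption.

```python
from statistics import quantiles


def quantile_summary(values: list[float]) -> dict[str, float]:
    """Percentiles, range and IQR using linear interpolation between order statistics."""
    data = sorted(values)
    # 99 cut points: cuts[k - 1] is the k-th percentile.
    cuts = quantiles(data, n=100, method="inclusive")
    q1, median, q3 = cuts[24], cuts[49], cuts[74]
    return {
        "min": data[0],
        "p5": cuts[4],
        "q1": q1,
        "median": median,
        "q3": q3,
        "p95": cuts[94],
        "max": data[-1],
        "range": data[-1] - data[0],
        "iqr": q3 - q1,
    }


print(quantile_summary(list(range(1, 11))))
```

For the values 1 to 10 this gives Q1 = 3.25, median = 5.5, Q3 = 7.75 and IQR = 4.5.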
Descriptive statistics

                                 Dataset A     Dataset B
Standard deviation               256.34908     261.9855
Coefficient of variation (CV)    0.5845268     0.59772733
Kurtosis                         -1.1315409    -1.2588459
Mean                             438.5583      438.30269
Median Absolute Deviation (MAD)  218           236
Skewness                         0.055681034   0.0070747928
Sum                              195597        195483
Variance                         65714.849     68636.4
Monotonicity                     Not monotonic Not monotonic
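Several of these rows are derived from one another: Variance is the square of Standard deviation (256.34908 squared is about 65714.85), CV is Standard deviation divided by Mean (256.34908 / 438.5583 is about 0.5845), and MAD is the median of the absolute deviations from the median. A standard-library sketch, assuming the sample (n - 1) standard deviation, which is the pandas default:

```python
from statistics import mean, median, stdev, variance


def dispersion(values: list[float]) -> dict[str, float]:
    """Sample std/variance, coefficient of variation and median absolute deviation."""
    m = mean(values)
    sd = stdev(values)          # n - 1 in the denominator
    med = median(values)
    return {
        "std": sd,
        "variance": variance(values),                  # equals sd ** 2
        "cv": sd / m,
        "mad": median(abs(x - med) for x in values),   # unscaled MAD
    }


print(dispersion([1, 2, 3, 4, 5]))
```

For 1 to 5 this gives variance 2.5, standard deviation of about 1.581, CV of about 0.527 and MAD 1.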
[Figure: Histogram with fixed size bins (bins=50)]
Common values

           Value                Count   Frequency (%)
Dataset A  702                  1       0.2%
           499                  1       0.2%
           176                  1       0.2%
           32                   1       0.2%
           433                  1       0.2%
           733                  1       0.2%
           785                  1       0.2%
           412                  1       0.2%
           449                  1       0.2%
           256                  1       0.2%
           Other values (436)   436     97.8%
Dataset B  690                  1       0.2%
           655                  1       0.2%
           2                    1       0.2%
           610                  1       0.2%
           745                  1       0.2%
           228                  1       0.2%
           846                  1       0.2%
           841                  1       0.2%
           619                  1       0.2%
           90                   1       0.2%
           Other values (436)   436     97.8%

Minimum 10 values

           Value   Count   Frequency (%)
Dataset A  2       1       0.2%
           3       1       0.2%
           4       1       0.2%
           5       1       0.2%
           6       1       0.2%
           7       1       0.2%
           8       1       0.2%
           9       1       0.2%
           10      1       0.2%
           11      1       0.2%
Dataset B  2       1       0.2%
           6       1       0.2%
           8       1       0.2%
           9       1       0.2%
           10      1       0.2%
           11      1       0.2%
           14      1       0.2%
           20      1       0.2%
           21      1       0.2%
           22      1       0.2%

Survived
Categorical

              Dataset A    Dataset B
Distinct      2            2
Distinct (%)  0.4%         0.4%
Missing       0            0
Missing (%)   0.0%         0.0%
Memory size   7.0 KiB      7.0 KiB

Value         Dataset A    Dataset B
0             273          259
1             173          187

Length

               Dataset A    Dataset B
Max length     1            1
Median length  1            1
Mean length    1            1
Min length     1            1

Characters and Unicode

                     Dataset A    Dataset B
Total characters     446          446
Distinct characters  2            2
Distinct categories  1            1
Distinct scripts     1            1
Distinct blocks      1            1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

            Dataset A    Dataset B
Unique      0            0
Unique (%)  0.0%         0.0%

Sample

         Dataset A    Dataset B
1st row  1            1
2nd row  0            0
3rd row  1            1
4th row  1            0
5th row  0            1

Common Values

           Value   Count   Frequency (%)
Dataset A  0       273     61.2%
           1       173     38.8%
Dataset B  0       259     58.1%
           1       187     41.9%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: value frequencies, one panel each for Dataset A and Dataset B]

Words

           Value   Count   Frequency (%)
Dataset A  0       273     61.2%
           1       173     38.8%
Dataset B  0       259     58.1%
           1       187     41.9%

Most occurring characters

           Value   Count   Frequency (%)
Dataset A  0       273     61.2%
           1       173     38.8%
Dataset B  0       259     58.1%
           1       187     41.9%

Most occurring categories

           Value       Count   Frequency (%)
Dataset A  (unknown)   446     100.0%
Dataset B  (unknown)   446     100.0%

Most frequent character per category

(unknown): same counts as Most occurring characters, for both datasets.

Most occurring scripts

           Value       Count   Frequency (%)
Dataset A  (unknown)   446     100.0%
Dataset B  (unknown)   446     100.0%

Most frequent character per script

(unknown): same counts as Most occurring characters, for both datasets.

Most occurring blocks

           Value       Count   Frequency (%)
Dataset A  (unknown)   446     100.0%
Dataset B  (unknown)   446     100.0%

Most frequent character per block

(unknown): same counts as Most occurring characters, for both datasets.

Pclass
Categorical

              Dataset A    Dataset B
Distinct      3            3
Distinct (%)  0.7%         0.7%
Missing       0            0
Missing (%)   0.0%         0.0%
Memory size   7.0 KiB      7.0 KiB

Value         Dataset A    Dataset B
3             248          249
1             101          113
2             97           84

Length

               Dataset A    Dataset B
Max length     1            1
Median length  1            1
Mean length    1            1
Min length     1            1

Characters and Unicode

                     Dataset A    Dataset B
Total characters     446          446
Distinct characters  3            3
Distinct categories  1            1
Distinct scripts     1            1
Distinct blocks      1            1

Unique

            Dataset A    Dataset B
Unique      0            0
Unique (%)  0.0%         0.0%

Sample

         Dataset A    Dataset B
1st row  1            1
2nd row  3            3
3rd row  2            3
4th row  1            3
5th row  3            3

Common Values

           Value   Count   Frequency (%)
Dataset A  3       248     55.6%
           1       101     22.6%
           2       97      21.7%
Dataset B  3       249     55.8%
           1       113     25.3%
           2       84      18.8%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: value frequencies, one panel each for Dataset A and Dataset B]

Words

           Value   Count   Frequency (%)
Dataset A  3       248     55.6%
           1       101     22.6%
           2       97      21.7%
Dataset B  3       249     55.8%
           1       113     25.3%
           2       84      18.8%

Most occurring characters

           Value   Count   Frequency (%)
Dataset A  3       248     55.6%
           1       101     22.6%
           2       97      21.7%
Dataset B  3       249     55.8%
           1       113     25.3%
           2       84      18.8%

Most occurring categories

           Value       Count   Frequency (%)
Dataset A  (unknown)   446     100.0%
Dataset B  (unknown)   446     100.0%

Most frequent character per category

(unknown): same counts as Most occurring characters, for both datasets.

Most occurring scripts

           Value       Count   Frequency (%)
Dataset A  (unknown)   446     100.0%
Dataset B  (unknown)   446     100.0%

Most frequent character per script

(unknown): same counts as Most occurring characters, for both datasets.

Most occurring blocks

           Value       Count   Frequency (%)
Dataset A  (unknown)   446     100.0%
Dataset B  (unknown)   446     100.0%

Most frequent character per block

(unknown): same counts as Most occurring characters, for both datasets.

Name
Text

              Dataset A    Dataset B
Distinct      446          446
Distinct (%)  100.0%       100.0%
Missing       0            0
Missing (%)   0.0%         0.0%
Memory size   7.0 KiB      7.0 KiB

Length

               Dataset A    Dataset B
Max length     82           82
Median length  50           49
Mean length    27.484305    27.004484
Min length     12           12

Characters and Unicode

                     Dataset A    Dataset B
Total characters     12258        12044
Distinct characters  59           60
Distinct categories  1            1
Distinct scripts     1            1
Distinct blocks      1            1

Unique

            Dataset A    Dataset B
Unique      446          446
Unique (%)  100.0%       100.0%

Sample

         Dataset A                                                 Dataset B
1st row  Silverthorne, Mr. Spencer Victor                          Madill, Miss. Georgette Alexandra
2nd row  Andersson, Miss. Sigrid Elisabeth                         Palsson, Master. Gosta Leonard
3rd row  Lehmann, Miss. Bertha                                     Coutts, Master. William Loch "William"
4th row  Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)             Ivanoff, Mr. Kanio
5th row  Vander Planke, Mrs. Julius (Emelia Maria Vandemoortele)   Lulic, Mr. Nikola

Words

           Value                Count   Frequency (%)
Dataset A  mr                   250     13.5%
           miss                 99      5.4%
           mrs                  65      3.5%
           william              30      1.6%
           john                 24      1.3%
           master               20      1.1%
           henry                19      1.0%
           mary                 13      0.7%
           james                13      0.7%
           charles              11      0.6%
           Other values (900)   1303    70.5%
Dataset B  mr                   262     14.3%
           miss                 92      5.0%
           mrs                  67      3.7%
           william              29      1.6%
           john                 28      1.5%
           master               19      1.0%
           james                18      1.0%
           henry                15      0.8%
           george               14      0.8%
           johan                10      0.5%
           Other values (884)   1273    69.7%

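The word counts lowercase each name and split it on whitespace; since "Mr." is counted as "mr" and "Silverthorne," as a surname, surrounding punctuation is evidently stripped as well. A sketch under those assumptions (whitespace split, lowercase, strip leading and trailing punctuation); the exact tokenizer used by ydata-profiling may differ.

```python
import string
from collections import Counter


def word_counts(values: list[str]) -> Counter:
    """Count lowercased, whitespace-separated words with outer punctuation removed."""
    words = Counter()
    for value in values:
        for token in value.lower().split():
            token = token.strip(string.punctuation)  # "mr." -> "mr", "c.a." -> "c.a"
            if token:
                words[token] += 1
    return words


names = ["Silverthorne, Mr. Spencer Victor", "Ivanoff, Mr. Kanio", "Lehmann, Miss. Bertha"]
print(word_counts(names).most_common(2))  # [('mr', 2), ('silverthorne', 1)]
```

The same rule turns the ticket prefixes "C.A." and "W./C." into "c.a" and "w./c", which matches the Ticket word table.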
Most occurring characters

           Value               Count   Frequency (%)
Dataset A  (space)             1402    11.4%
           r                   971     7.9%
           e                   854     7.0%
           a                   824     6.7%
           s                   687     5.6%
           n                   682     5.6%
           i                   674     5.5%
           l                   568     4.6%
           M                   555     4.5%
           o                   516     4.2%
           Other values (49)   4525    36.9%
Dataset B  (space)             1381    11.5%
           r                   966     8.0%
           e                   862     7.2%
           a                   834     6.9%
           n                   676     5.6%
           i                   629     5.2%
           s                   629     5.2%
           M                   567     4.7%
           o                   526     4.4%
           l                   515     4.3%
           Other values (50)   4459    37.0%

Most occurring categories

           Value       Count   Frequency (%)
Dataset A  (unknown)   12258   100.0%
Dataset B  (unknown)   12044   100.0%

Most frequent character per category

(unknown): same counts as Most occurring characters, for both datasets.

Most occurring scripts

           Value       Count   Frequency (%)
Dataset A  (unknown)   12258   100.0%
Dataset B  (unknown)   12044   100.0%

Most frequent character per script

(unknown): same counts as Most occurring characters, for both datasets.

Most occurring blocks

           Value       Count   Frequency (%)
Dataset A  (unknown)   12258   100.0%
Dataset B  (unknown)   12044   100.0%

Most frequent character per block

(unknown): same counts as Most occurring characters, for both datasets.

Sex
Categorical

              Dataset A    Dataset B
Distinct      2            2
Distinct (%)  0.4%         0.4%
Missing       0            0
Missing (%)   0.0%         0.0%
Memory size   7.0 KiB      7.0 KiB

Value         Dataset A    Dataset B
male          280          286
female        166          160

Length

               Dataset A    Dataset B
Max length     6            6
Median length  4            4
Mean length    4.7443946    4.7174888
Min length     4            4

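Because Sex has only two values, its length statistics follow directly from the value counts: Dataset A has 280 x 4 + 166 x 6 = 2116 characters, and 2116 / 446 is about 4.7444, the reported mean length. A small sketch that computes the same summary from a `{value: count}` mapping (an illustrative input format, not a report API):

```python
from statistics import median


def length_summary(counts: dict[str, int]) -> dict[str, float]:
    """Length statistics for a column given as {value: number of occurrences}."""
    lengths = [len(value) for value, n in counts.items() for _ in range(n)]
    return {
        "max": max(lengths),
        "median": median(lengths),
        "mean": sum(lengths) / len(lengths),
        "min": min(lengths),
        "total_characters": sum(lengths),
    }


print(length_summary({"male": 280, "female": 166}))
```

For Dataset A this gives a maximum of 6, a median and minimum of 4 and 2116 characters in total; the Dataset B counts (286 and 160) give 2104 characters.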
Characters and Unicode

                     Dataset A    Dataset B
Total characters     2116         2104
Distinct characters  5            5
Distinct categories  1            1
Distinct scripts     1            1
Distinct blocks      1            1

Unique

            Dataset A    Dataset B
Unique      0            0
Unique (%)  0.0%         0.0%

Sample

         Dataset A    Dataset B
1st row  male         female
2nd row  female       male
3rd row  female       male
4th row  female       male
5th row  female       male

Common Values

           Value    Count   Frequency (%)
Dataset A  male     280     62.8%
           female   166     37.2%
Dataset B  male     286     64.1%
           female   160     35.9%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: value frequencies, one panel each for Dataset A and Dataset B]

Words

           Value    Count   Frequency (%)
Dataset A  male     280     62.8%
           female   166     37.2%
Dataset B  male     286     64.1%
           female   160     35.9%

Most occurring characters

           Value   Count   Frequency (%)
Dataset A  e       612     28.9%
           m       446     21.1%
           a       446     21.1%
           l       446     21.1%
           f       166     7.8%
Dataset B  e       606     28.8%
           m       446     21.2%
           a       446     21.2%
           l       446     21.2%
           f       160     7.6%

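The character table can be checked by hand: every value contains one m, one a and one l, "male" contributes one e and "female" two, so Dataset A has 280 + 2 x 166 = 612 e's out of 2116 characters (28.9%). A sketch with `collections.Counter`, again taking a `{value: count}` mapping as an illustrative input:

```python
from collections import Counter


def char_frequencies(counts: dict[str, int]) -> list[tuple[str, int, float]]:
    """(character, count, % of all characters), most frequent first."""
    chars = Counter()
    for value, n in counts.items():
        for ch, k in Counter(value).items():
            chars[ch] += k * n  # k occurrences per value, n values
    total = sum(chars.values())
    return [(ch, k, round(100 * k / total, 1)) for ch, k in chars.most_common()]


print(char_frequencies({"male": 280, "female": 166}))
# [('e', 612, 28.9), ('m', 446, 21.1), ('a', 446, 21.1), ('l', 446, 21.1), ('f', 166, 7.8)]
```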
Most occurring categories

           Value       Count   Frequency (%)
Dataset A  (unknown)   2116    100.0%
Dataset B  (unknown)   2104    100.0%

Most frequent character per category

(unknown): same counts as Most occurring characters, for both datasets.

Most occurring scripts

           Value       Count   Frequency (%)
Dataset A  (unknown)   2116    100.0%
Dataset B  (unknown)   2104    100.0%

Most frequent character per script

(unknown): same counts as Most occurring characters, for both datasets.

Most occurring blocks

           Value       Count   Frequency (%)
Dataset A  (unknown)   2116    100.0%
Dataset B  (unknown)   2104    100.0%

Most frequent character per block

(unknown): same counts as Most occurring characters, for both datasets.

Age
Real number (ℝ)

                 Dataset A    Dataset B
Distinct         77           73
Distinct (%)     21.1%        21.2%
Missing          81           101
Missing (%)      18.2%        22.6%
Infinite         0            0
Infinite (%)     0.0%         0.0%
Mean             29.448411    28.481652
Minimum          0.42         0.42
Maximum          74           74
Zeros            0            0
Zeros (%)        0.0%         0.0%
Negative         0            0
Negative (%)     0.0%         0.0%
Memory size      7.0 KiB      7.0 KiB

Quantile statistics

                           Dataset A    Dataset B
Minimum                    0.42         0.42
5th percentile             4            4
Q1                         19           20
Median                     28.5         27
Q3                         39           36
95th percentile            55           54
Maximum                    74           74
Range                      73.58        73.58
Interquartile range (IQR)  20           16

Descriptive statistics

                                 Dataset A     Dataset B
Standard deviation               14.523908     13.978708
Coefficient of variation (CV)    0.49319837    0.49079695
Kurtosis                         -0.052022994  0.47179103
Mean                             29.448411     28.481652
Median Absolute Deviation (MAD)  9.5           8
Skewness                         0.27915925    0.44265153
Sum                              10748.67      9826.17
Variance                         210.94392     195.40427
Monotonicity                     Not monotonic Not monotonic
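Every Age statistic is computed over the non-missing values only. For Dataset A that is 446 - 81 = 365 ages: 10748.67 / 365 is about 29.4484, which reproduces the Mean, and Distinct (%) is 77 / 365, about 21.1%. A sketch, assuming `None` marks a missing value:

```python
def non_missing_summary(values: list) -> dict[str, float]:
    """Missing share, plus sum, mean and distinct share over non-missing values."""
    present = [v for v in values if v is not None]
    n = len(present)
    return {
        "missing": len(values) - n,
        "missing_pct": 100 * (len(values) - n) / len(values),
        "sum": sum(present),
        "mean": sum(present) / n,
        "distinct_pct": 100 * len(set(present)) / n,  # denominator excludes missing
    }


ages = [22.0, None, 38.0, 26.0, None, 35.0, 35.0]
print(non_missing_summary(ages))
```

Here 2 of the 7 ages are missing, so the mean and the distinct share (4 of 5 values, 80%) are computed over the 5 remaining ages.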
[Figure: Histogram with fixed size bins (bins=50)]
Common values

           Value               Count   Frequency (%)
Dataset A  18                  15      3.4%
           25                  14      3.1%
           28                  14      3.1%
           21                  14      3.1%
           19                  13      2.9%
           36                  12      2.7%
           16                  12      2.7%
           35                  12      2.7%
           39                  12      2.7%
           30                  12      2.7%
           Other values (67)   235     52.7%
           (Missing)           81      18.2%
Dataset B  24                  18      4.0%
           21                  15      3.4%
           27                  13      2.9%
           22                  13      2.9%
           18                  13      2.9%
           16                  12      2.7%
           19                  12      2.7%
           36                  12      2.7%
           30                  11      2.5%
           26                  11      2.5%
           Other values (63)   215     48.2%
           (Missing)           101     22.6%

Minimum 10 values

           Value   Count   Frequency (%)
Dataset A  0.42    1       0.2%
           0.75    2       0.4%
           0.83    1       0.2%
           0.92    1       0.2%
           1       1       0.2%
           2       5       1.1%
           3       4       0.9%
           4       7       1.6%
           5       4       0.9%
           6       1       0.2%
Dataset B  0.42    1       0.2%
           0.67    1       0.2%
           0.75    1       0.2%
           0.83    1       0.2%
           1       3       0.7%
           2       5       1.1%
           3       3       0.7%
           4       6       1.3%
           5       4       0.9%
           6       2       0.4%

SibSp
Real number (ℝ)

                 Dataset A    Dataset B
Distinct         6            7
Distinct (%)     1.3%         1.6%
Missing          0            0
Missing (%)      0.0%         0.0%
Infinite         0            0
Infinite (%)     0.0%         0.0%
Mean             0.4573991    0.47757848
Minimum          0            0
Maximum          5            8
Zeros            306          308
Zeros (%)        68.6%        69.1%
Negative         0            0
Negative (%)     0.0%         0.0%
Memory size      7.0 KiB      7.0 KiB

Quantile statistics

                           Dataset A    Dataset B
Minimum                    0            0
5th percentile             0            0
Q1                         0            0
Median                     0            0
Q3                         1            1
95th percentile            2            2
Maximum                    5            8
Range                      5            8
Interquartile range (IQR)  1            1

Descriptive statistics

                                 Dataset A     Dataset B
Standard deviation               0.8541907     1.0507095
Coefficient of variation (CV)    1.8674954     2.2000772
Kurtosis                         7.2758737     24.33517
Mean                             0.4573991     0.47757848
Median Absolute Deviation (MAD)  0             0
Skewness                         2.534483      4.2861733
Sum                              204           213
Variance                         0.72964176    1.1039905
Monotonicity                     Not monotonic Not monotonic
[Figure: Histogram with fixed size bins (bins=6)]
Common values

           Value   Count   Frequency (%)
Dataset A  0       306     68.6%
           1       106     23.8%
           2       16      3.6%
           4       10      2.2%
           3       7       1.6%
           5       1       0.2%
Dataset B  0       308     69.1%
           1       108     24.2%
           2       13      2.9%
           3       7       1.6%
           8       4       0.9%
           4       4       0.9%
           5       2       0.4%

Minimum 10 values

           Value   Count   Frequency (%)
Dataset A  0       306     68.6%
           1       106     23.8%
           2       16      3.6%
           3       7       1.6%
           4       10      2.2%
           5       1       0.2%
Dataset B  0       308     69.1%
           1       108     24.2%
           2       13      2.9%
           3       7       1.6%
           4       4       0.9%
           5       2       0.4%
           8       4       0.9%

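The histogram captions read bins=50 for PassengerId, Age and Fare, but bins=6 for SibSp and bins=7 for Parch, which equals the number of distinct values those two columns have in Dataset A. That suggests the bin count is capped at the number of distinct values; this rule is inferred from the captions, not confirmed. A sketch of equal-width ("fixed size") binning under that assumption:

```python
def fixed_width_histogram(values: list[float], max_bins: int = 50) -> tuple[list[float], list[int]]:
    """Equal-width histogram with min(max_bins, number of distinct values) bins.

    The cap on the bin count is an assumption inferred from the report captions.
    """
    bins = min(max_bins, len(set(values)))
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins if hi > lo else 1.0
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for v in values:
        # Bins are half-open [left, right); the last bin also includes the maximum.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return edges, counts


# SibSp, Dataset A, rebuilt from its value counts (0:306, 1:106, 2:16, 3:7, 4:10, 5:1).
sibsp_a = [0] * 306 + [1] * 106 + [2] * 16 + [3] * 7 + [4] * 10 + [5] * 1
edges, counts = fixed_width_histogram(sibsp_a)
print(len(counts), counts)  # 6 [306, 106, 16, 7, 10, 1]
```
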
Parch
Real number (ℝ)

                 Dataset A    Dataset B
Distinct         7            7
Distinct (%)     1.6%         1.6%
Missing          0            0
Missing (%)      0.0%         0.0%
Infinite         0            0
Infinite (%)     0.0%         0.0%
Mean             0.40807175   0.37892377
Minimum          0            0
Maximum          6            6
Zeros            336          338
Zeros (%)        75.3%        75.8%
Negative         0            0
Negative (%)     0.0%         0.0%
Memory size      7.0 KiB      7.0 KiB

Quantile statistics

                           Dataset A    Dataset B
Minimum                    0            0
5th percentile             0            0
Q1                         0            0
Median                     0            0
Q3                         0            0
95th percentile            2            2
Maximum                    6            6
Range                      6            6
Interquartile range (IQR)  0            0

Descriptive statistics

                                 Dataset A     Dataset B
Standard deviation               0.86859171    0.81414722
Coefficient of variation (CV)    2.1285269     2.1485779
Kurtosis                         10.822869     12.212103
Mean                             0.40807175    0.37892377
Median Absolute Deviation (MAD)  0             0
Skewness                         2.9096457     3.0177092
Sum                              182           169
Variance                         0.75445155    0.66283569
Monotonicity                     Not monotonic Not monotonic
[Figure: Histogram with fixed size bins (bins=7)]
Common values

           Value   Count   Frequency (%)
Dataset A  0       336     75.3%
           1       60      13.5%
           2       41      9.2%
           5       4       0.9%
           3       2       0.4%
           4       2       0.4%
           6       1       0.2%
Dataset B  0       338     75.8%
           1       65      14.6%
           2       35      7.8%
           5       3       0.7%
           3       3       0.7%
           4       1       0.2%
           6       1       0.2%

Minimum 10 values

           Value   Count   Frequency (%)
Dataset A  0       336     75.3%
           1       60      13.5%
           2       41      9.2%
           3       2       0.4%
           4       2       0.4%
           5       4       0.9%
           6       1       0.2%
Dataset B  0       338     75.8%
           1       65      14.6%
           2       35      7.8%
           3       3       0.7%
           4       1       0.2%
           5       3       0.7%
           6       1       0.2%

Ticket
Text

              Dataset A    Dataset B
Distinct      387          374
Distinct (%)  86.8%        83.9%
Missing       0            0
Missing (%)   0.0%         0.0%
Memory size   7.0 KiB      7.0 KiB

Length

               Dataset A    Dataset B
Max length     18           18
Median length  17           17
Mean length    6.6681614    6.7286996
Min length     3            4

Characters and Unicode

                     Dataset A    Dataset B
Total characters     2974         3001
Distinct characters  32           32
Distinct categories  1            1
Distinct scripts     1            1
Distinct blocks      1            1

Unique

            Dataset A    Dataset B
Unique      345          316
Unique (%)  77.4%        70.9%

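Distinct counts every different value, while Unique counts only the values that occur exactly once. For Ticket in Dataset A, 387 different tickets appear but only 345 of them belong to a single passenger, so 387 - 345 = 42 ticket numbers are shared. A sketch with `collections.Counter`:

```python
from collections import Counter


def distinct_and_unique(values: list[str]) -> tuple[int, int]:
    """Return (distinct values, values that occur exactly once)."""
    counts = Counter(values)
    distinct = len(counts)
    unique = sum(1 for n in counts.values() if n == 1)
    return distinct, unique


tickets = ["347082", "347082", "PC 17475", "SC 1748", "11767"]
print(distinct_and_unique(tickets))  # (4, 3)
```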
Sample

         Dataset A    Dataset B
1st row  PC 17475     24160
2nd row  347082       349909
3rd row  SC 1748      C.A. 37671
4th row  11767        349201
5th row  345763       315098

Words

           Value                Count   Frequency (%)
Dataset A  pc                   27      4.9%
           c.a                  15      2.7%
           a/5                  11      2.0%
           ston/o               6       1.1%
           2                    6       1.1%
           347088               5       0.9%
           347082               5       0.9%
           sc/paris             4       0.7%
           f.c.c                4       0.7%
           w./c                 4       0.7%
           Other values (406)   469     84.4%
Dataset B  pc                   36      6.3%
           c.a                  17      3.0%
           ca                   8       1.4%
           a/5                  8       1.4%
           2                    6       1.0%
           ston/o               6       1.0%
           w./c                 5       0.9%
           1601                 4       0.7%
           2343                 4       0.7%
           soton/o.q            4       0.7%
           Other values (394)   477     83.0%

Most occurring characters

           Value               Count   Frequency (%)
Dataset A  3                   375     12.6%
           1                   366     12.3%
           2                   291     9.8%
           7                   267     9.0%
           4                   230     7.7%
           0                   216     7.3%
           6                   207     7.0%
           5                   199     6.7%
           8                   144     4.8%
           9                   140     4.7%
           Other values (22)   539     18.1%
Dataset B  3                   360     12.0%
           1                   344     11.5%
           2                   288     9.6%
           7                   242     8.1%
           6                   223     7.4%
           4                   209     7.0%
           5                   206     6.9%
           0                   203     6.8%
           9                   173     5.8%
           8                   139     4.6%
           Other values (22)   614     20.5%

Most occurring categories

           Value       Count   Frequency (%)
Dataset A  (unknown)   2974    100.0%
Dataset B  (unknown)   3001    100.0%

Most frequent character per category

(unknown): same counts as Most occurring characters, for both datasets.

Most occurring scripts

           Value       Count   Frequency (%)
Dataset A  (unknown)   2974    100.0%
Dataset B  (unknown)   3001    100.0%

Most frequent character per script

(unknown): same counts as Most occurring characters, for both datasets.

Most occurring blocks

           Value       Count   Frequency (%)
Dataset A  (unknown)   2974    100.0%
Dataset B  (unknown)   3001    100.0%

Most frequent character per block

(unknown): same counts as Most occurring characters, for both datasets.

Fare
Real number (ℝ)

                 Dataset A    Dataset B
Distinct         179          176
Distinct (%)     40.1%        39.5%
Missing          0            0
Missing (%)      0.0%         0.0%
Infinite         0            0
Infinite (%)     0.0%         0.0%
Mean             29.944842    32.795645
Minimum          0            0
Maximum          512.3292     512.3292
Zeros            5            9
Zeros (%)        1.1%         2.0%
Negative         0            0
Negative (%)     0.0%         0.0%
Memory size      7.0 KiB      7.0 KiB

Quantile statistics

                           Dataset A    Dataset B
Minimum                    0            0
5th percentile             7.225        7.225
Q1                         7.925        7.8958
Median                     14.12915     14.5
Q3                         29.7         31.20625
95th percentile            108.28125    110.8833
Maximum                    512.3292     512.3292
Range                      512.3292     512.3292
Interquartile range (IQR)  21.775       23.31045

Descriptive statistics

                                 Dataset A     Dataset B
Standard deviation               47.890684     49.611976
Coefficient of variation (CV)    1.5992966     1.5127611
Kurtosis                         48.507288     25.339136
Mean                             29.944842     32.795645
Median Absolute Deviation (MAD)  6.47915       6.9646
Skewness                         5.8452184     4.1630847
Sum                              13355.4       14626.858
Variance                         2293.5176     2461.3482
Monotonicity                     Not monotonic Not monotonic
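Fare's skewness (5.85 and 4.16) and kurtosis (48.5 and 25.3) are far larger than those of the other numeric columns, reflecting a long right tail: a maximum of 512.3292 against a median of about 14. A sketch of the bias-adjusted sample skewness and excess kurtosis; that these are the estimators behind the report (they are the pandas defaults) is an assumption.

```python
from math import sqrt


def skew_kurtosis(values: list[float]) -> tuple[float, float]:
    """Bias-adjusted sample skewness (G1) and excess kurtosis (G2); needs n >= 4."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((x - m) ** 2 for x in values) / n  # central moments, population form
    m3 = sum((x - m) ** 3 for x in values) / n
    m4 = sum((x - m) ** 4 for x in values) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3.0
    skew = sqrt(n * (n - 1)) / (n - 2) * g1
    kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6.0)
    return skew, kurt


print(skew_kurtosis([1, 2, 3, 4, 5]))  # symmetric: skewness 0.0, kurtosis about -1.2
```

A single large value, as in `skew_kurtosis([1, 1, 1, 1, 10])`, makes the skewness positive, which is the pattern Fare shows.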
[Figure: Histogram with fixed size bins (bins=50)]
Common values

           Value                Count   Frequency (%)
Dataset A  13                   25      5.6%
           26                   18      4.0%
           7.75                 17      3.8%
           8.05                 17      3.8%
           7.8958               16      3.6%
           10.5                 14      3.1%
           7.925                10      2.2%
           7.775                8       1.8%
           8.6625               8       1.8%
           7.8542               8       1.8%
           Other values (169)   305     68.4%
Dataset B  13                   19      4.3%
           8.05                 19      4.3%
           7.8958               18      4.0%
           7.75                 18      4.0%
           26                   13      2.9%
           10.5                 13      2.9%
           7.925                10      2.2%
           0                    9       2.0%
           7.2292               9       2.0%
           7.775                9       2.0%
           Other values (166)   309     69.3%

Minimum 10 values

           Value    Count   Frequency (%)
Dataset A  0        5       1.1%
           5        1       0.2%
           6.2375   1       0.2%
           6.4375   1       0.2%
           6.45     1       0.2%
           6.4958   2       0.4%
           6.8583   1       0.2%
           6.95     1       0.2%
           7.0458   1       0.2%
           7.05     4       0.9%
Dataset B  0        9       2.0%
           4.0125   1       0.2%
           6.2375   1       0.2%
           6.4958   2       0.4%
           6.75     1       0.2%
           6.8583   1       0.2%
           7.0458   1       0.2%
           7.05     3       0.7%
           7.0542   1       0.2%
           7.1417   1       0.2%

Cabin
Text

              Dataset A    Dataset B
Distinct      78           91
Distinct (%)  82.1%        83.5%
Missing       351          337
Missing (%)   78.7%        75.6%
Memory size   7.0 KiB      7.0 KiB

Length

               Dataset A    Dataset B
Max length     15           11
Median length  3            3
Mean length    3.5578947    3.4311927
Min length     1            1

Characters and Unicode

                     Dataset A    Dataset B
Total characters     338          374
Distinct characters  19           18
Distinct categories  1            1
Distinct scripts     1            1
Distinct blocks      1            1

Unique

            Dataset A    Dataset B
Unique      63           76
Unique (%)  66.3%        69.7%

Sample

         Dataset A    Dataset B
1st row  E24          B5
2nd row  C50          E36
3rd row  A36          C93
4th row  C128         B78
5th row  A24          C126

Words

           Value               Count   Frequency (%)
Dataset A  f                   4       3.6%
           b96                 3       2.7%
           c22                 3       2.7%
           c26                 3       2.7%
           b98                 3       2.7%
           e24                 2       1.8%
           e44                 2       1.8%
           e8                  2       1.8%
           g73                 2       1.8%
           c92                 2       1.8%
           Other values (77)   85      76.6%
Dataset B  g6                  4       3.3%
           c25                 3       2.4%
           c27                 3       2.4%
           c23                 3       2.4%
           e24                 2       1.6%
           c126                2       1.6%
           f4                  2       1.6%
           b60                 2       1.6%
           b58                 2       1.6%
           c26                 2       1.6%
           Other values (89)   98      79.7%

Most occurring characters

           Value              Count   Frequency (%)
Dataset A  B                  32      9.5%
           2                  31      9.2%
           C                  30      8.9%
           1                  28      8.3%
           6                  26      7.7%
           3                  24      7.1%
           8                  20      5.9%
           7                  18      5.3%
           E                  18      5.3%
           5                  18      5.3%
           Other values (9)   93      27.5%
Dataset B  2                  44      11.8%
           C                  42      11.2%
           1                  33      8.8%
           5                  27      7.2%
           3                  25      6.7%
           6                  24      6.4%
           B                  24      6.4%
           4                  22      5.9%
           E                  19      5.1%
           0                  19      5.1%
           Other values (8)   95      25.4%

Most occurring categories

           Value       Count   Frequency (%)
Dataset A  (unknown)   338     100.0%
Dataset B  (unknown)   374     100.0%

Most frequent character per category

(unknown): same counts as Most occurring characters, for both datasets.

Most occurring scripts

           Value       Count   Frequency (%)
Dataset A  (unknown)   338     100.0%
Dataset B  (unknown)   374     100.0%

Most frequent character per script

(unknown): same counts as Most occurring characters, for both datasets.

Most occurring blocks

           Value       Count   Frequency (%)
Dataset A  (unknown)   338     100.0%
Dataset B  (unknown)   374     100.0%

Most frequent character per block

(unknown): same counts as Most occurring characters, for both datasets.

Embarked
Categorical

              Dataset A    Dataset B
Distinct      3            3
Distinct (%)  0.7%         0.7%
Missing       0            1
Missing (%)   0.0%         0.2%
Memory size   7.0 KiB      7.0 KiB

Value         Dataset A    Dataset B
S             328          315
C             77           89
Q             41           41

Length

               Dataset A    Dataset B
Max length     1            1
Median length  1            1
Mean length    1            1
Min length     1            1

Characters and Unicode

                     Dataset A    Dataset B
Total characters     446          445
Distinct characters  3            3
Distinct categories  1            1
Distinct scripts     1            1
Distinct blocks      1            1

Unique

            Dataset A    Dataset B
Unique      0            0
Unique (%)  0.0%         0.0%

Sample

         Dataset A    Dataset B
1st row  S            S
2nd row  S            S
3rd row  C            S
4th row  C            S
5th row  S            S

Common Values

           Value       Count   Frequency (%)
Dataset A  S           328     73.5%
           C           77      17.3%
           Q           41      9.2%
Dataset B  S           315     70.6%
           C           89      20.0%
           Q           41      9.2%
           (Missing)   1       0.2%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: value frequencies, one panel each for Dataset A and Dataset B]

Words

           Value   Count   Frequency (%)
Dataset A  s       328     73.5%
           c       77      17.3%
           q       41      9.2%
Dataset B  s       315     70.8%
           c       89      20.0%
           q       41      9.2%

Most occurring characters

           Value   Count   Frequency (%)
Dataset A  S       328     73.5%
           C       77      17.3%
           Q       41      9.2%
Dataset B  S       315     70.8%
           C       89      20.0%
           Q       41      9.2%

Most occurring categories

           Value       Count   Frequency (%)
Dataset A  (unknown)   446     100.0%
Dataset B  (unknown)   445     100.0%

Most frequent character per category

(unknown): same counts as Most occurring characters, for both datasets.

Most occurring scripts

           Value       Count   Frequency (%)
Dataset A  (unknown)   446     100.0%
Dataset B  (unknown)   445     100.0%

Most frequent character per script

(unknown): same counts as Most occurring characters, for both datasets.

Most occurring blocks

           Value       Count   Frequency (%)
Dataset A  (unknown)   446     100.0%
Dataset B  (unknown)   445     100.0%

Most frequent character per block

(unknown): same counts as Most occurring characters, for both datasets.

Interactions

[Figure: 25 pairwise interaction plots (5 x 5 numeric variables), each with one panel for Dataset A and one for Dataset B]

Missing values

Dataset A

A simple visualization of nullity by column.

Dataset B

A simple visualization of nullity by column.
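The nullity-by-column chart counts, for each column, how many values are missing. The percentages in the alerts follow from dividing that count by the number of rows: 81 missing `Age` values among Dataset A's 446 rows give 18.2%. The minimal sketch below computes the same counts in plain Python. It assumes, hypothetically, that rows are dicts using `None` for a missing cell (shown as NaN in the report), and the toy `rows` are illustrative values, not the report's data.

```python
from typing import Any, Dict, List, Optional


def missing_counts(rows: List[Dict[str, Optional[Any]]]) -> Dict[str, int]:
    """Count missing (None) cells per column; assumes every row has the same keys."""
    counts: Dict[str, int] = {}
    for row in rows:
        for column, value in row.items():
            # A bool adds as 0 or 1, so this increments only for missing cells.
            counts[column] = counts.get(column, 0) + (value is None)
    return counts


def missing_percent(missing: int, n_rows: int) -> float:
    """Missing share as a percentage, rounded to one decimal like the report."""
    return round(100.0 * missing / n_rows, 1)


# Hypothetical toy rows; None marks a missing value.
rows = [
    {"Age": 35.0, "Cabin": "E24"},
    {"Age": None, "Cabin": None},
    {"Age": 17.0, "Cabin": None},
]
print(missing_counts(rows))      # {'Age': 1, 'Cabin': 2}
print(missing_percent(81, 446))  # 18.2
```

The same division reproduces the overview figure: 432 missing cells out of 446 × 12 = 5,352 cells is 8.1%.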

Dataset A

The nullity matrix is a data-dense display that lets you quickly spot patterns in data completeness by eye.

Dataset B

The nullity matrix is a data-dense display that lets you quickly spot patterns in data completeness by eye.
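A nullity matrix has one row per record and one column per variable, and each cell shows whether that value is present. The sketch below renders such a grid as text. It reuses the hypothetical dict-of-rows layout with `None` for missing values, and its `#` (present) and `.` (missing) markers are arbitrary choices. The report itself draws this as an image.

```python
from typing import Any, Dict, List, Optional


def nullity_matrix(rows: List[Dict[str, Optional[Any]]], columns: List[str]) -> List[List[bool]]:
    """True where a value is present; False where it is None or the key is absent."""
    return [[row.get(col) is not None for col in columns] for row in rows]


def render(matrix: List[List[bool]]) -> str:
    """One text line per record: '#' = present, '.' = missing."""
    return "\n".join("".join("#" if present else "." for present in line) for line in matrix)


columns = ["Name", "Age", "Cabin"]
rows = [  # hypothetical toy rows
    {"Name": "A", "Age": 35.0, "Cabin": "E24"},
    {"Name": "B", "Age": None, "Cabin": None},
    {"Name": "C", "Age": 17.0, "Cabin": None},
]
print(render(nullity_matrix(rows, columns)))
# ###
# #..
# ##.
```

In the sketch, records missing both `Age` and `Cabin` appear as vertically aligned dots. The report's matrix plot is built to make that kind of co-occurrence visible at a glance.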

Dataset A

The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.

Dataset B

The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
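Nullity correlation can be read as the Pearson correlation between two columns' missingness indicators (1 = missing, 0 = present). It ranges from -1, where one column is missing exactly when the other is present, to 1, where the two are always missing together. It is undefined when a column is never or always missing. The plain-Python sketch below rests on that assumption. Libraries such as missingno derive their heatmap from pandas null indicators, but any filtering or rounding they apply is not reproduced here, and this is not ydata-profiling's own code.

```python
import math
from typing import Optional, Sequence


def nullity_correlation(a: Sequence[object], b: Sequence[object]) -> Optional[float]:
    """Pearson correlation of the missingness indicators of two equal-length columns.

    Returns None when either column is entirely missing or entirely present,
    because its indicator then has zero variance and the correlation is undefined.
    """
    if len(a) != len(b) or not a:
        raise ValueError("columns must be non-empty and of equal length")
    x = [1.0 if v is None else 0.0 for v in a]  # 1.0 = missing
    y = [1.0 if v is None else 0.0 for v in b]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    vx = sum((xi - mx) ** 2 for xi in x)
    vy = sum((yi - my) ** 2 for yi in y)
    if vx == 0 or vy == 0:
        return None
    return cov / math.sqrt(vx * vy)


# Hypothetical toy columns; None marks a missing value.
age = [35.0, None, 17.0, None]
cabin = ["E24", None, None, None]
print(round(nullity_correlation(age, cabin), 3))  # 0.577
```

A positive value such as this one means that when `Age` is missing, `Cabin` tends to be missing too.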

Sample

Dataset A

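| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 701 | 702 | 1 | 1 | Silverthorne, Mr. Spencer Victor | male | 35.0 | 0 | 0 | PC 17475 | 26.2875 | E24 | S |
| 542 | 543 | 0 | 3 | Andersson, Miss. Sigrid Elisabeth | female | 11.0 | 4 | 2 | 347082 | 31.2750 | NaN | S |
| 389 | 390 | 1 | 2 | Lehmann, Miss. Bertha | female | 17.0 | 0 | 0 | SC 1748 | 12.0000 | NaN | C |
| 879 | 880 | 1 | 1 | Potter, Mrs. Thomas Jr (Lily Alexenia Wilson) | female | 56.0 | 0 | 1 | 11767 | 83.1583 | C50 | C |
| 18 | 19 | 0 | 3 | Vander Planke, Mrs. Julius (Emelia Maria Vandemoortele) | female | 31.0 | 1 | 0 | 345763 | 18.0000 | NaN | S |
| 212 | 213 | 0 | 3 | Perkin, Mr. John Henry | male | 22.0 | 0 | 0 | A/5 21174 | 7.2500 | NaN | S |
| 864 | 865 | 0 | 2 | Gill, Mr. John William | male | 24.0 | 0 | 0 | 233866 | 13.0000 | NaN | S |
| 806 | 807 | 0 | 1 | Andrews, Mr. Thomas Jr | male | 39.0 | 0 | 0 | 112050 | 0.0000 | A36 | S |
| 752 | 753 | 0 | 3 | Vande Velde, Mr. Johannes Joseph | male | 33.0 | 0 | 0 | 345780 | 9.5000 | NaN | S |
| 425 | 426 | 0 | 3 | Wiseman, Mr. Phillippe | male | NaN | 0 | 0 | A/4. 34244 | 7.2500 | NaN | S |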

Dataset B

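| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 689 | 690 | 1 | 1 | Madill, Miss. Georgette Alexandra | female | 15.0 | 0 | 1 | 24160 | 211.3375 | B5 | S |
| 7 | 8 | 0 | 3 | Palsson, Master. Gosta Leonard | male | 2.0 | 3 | 1 | 349909 | 21.0750 | NaN | S |
| 348 | 349 | 1 | 3 | Coutts, Master. William Loch "William" | male | 3.0 | 1 | 1 | C.A. 37671 | 15.9000 | NaN | S |
| 738 | 739 | 0 | 3 | Ivanoff, Mr. Kanio | male | NaN | 0 | 0 | 349201 | 7.8958 | NaN | S |
| 821 | 822 | 1 | 3 | Lulic, Mr. Nikola | male | 27.0 | 0 | 0 | 315098 | 8.6625 | NaN | S |
| 620 | 621 | 0 | 3 | Yasbeck, Mr. Antoni | male | 27.0 | 1 | 0 | 2659 | 14.4542 | NaN | C |
| 309 | 310 | 1 | 1 | Francatelli, Miss. Laura Mabel | female | 30.0 | 0 | 0 | PC 17485 | 56.9292 | E36 | C |
| 794 | 795 | 0 | 3 | Dantcheff, Mr. Ristiu | male | 25.0 | 0 | 0 | 349203 | 7.8958 | NaN | S |
| 47 | 48 | 1 | 3 | O'Driscoll, Miss. Bridget | female | NaN | 0 | 0 | 14311 | 7.7500 | NaN | Q |
| 224 | 225 | 1 | 1 | Hoyt, Mr. Frederick Maxfield | male | 38.0 | 1 | 0 | 19943 | 90.0000 | C93 | S |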

Dataset A

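| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 714 | 715 | 0 | 2 | Greenberg, Mr. Samuel | male | 52.0 | 0 | 0 | 250647 | 13.0000 | NaN | S |
| 166 | 167 | 1 | 1 | Chibnall, Mrs. (Edith Martha Bowerman) | female | NaN | 0 | 1 | 113505 | 55.0000 | E33 | S |
| 116 | 117 | 0 | 3 | Connors, Mr. Patrick | male | 70.5 | 0 | 0 | 370369 | 7.7500 | NaN | Q |
| 167 | 168 | 0 | 3 | Skoog, Mrs. William (Anna Bernhardina Karlsson) | female | 45.0 | 1 | 4 | 347088 | 27.9000 | NaN | S |
| 517 | 518 | 0 | 3 | Ryan, Mr. Patrick | male | NaN | 0 | 0 | 371110 | 24.1500 | NaN | Q |
| 193 | 194 | 1 | 2 | Navratil, Master. Michel M | male | 3.0 | 1 | 1 | 230080 | 26.0000 | F2 | S |
| 285 | 286 | 0 | 3 | Stankovic, Mr. Ivan | male | 33.0 | 0 | 0 | 349239 | 8.6625 | NaN | C |
| 873 | 874 | 0 | 3 | Vander Cruyssen, Mr. Victor | male | 47.0 | 0 | 0 | 345765 | 9.0000 | NaN | S |
| 868 | 869 | 0 | 3 | van Melkebeke, Mr. Philemon | male | NaN | 0 | 0 | 345777 | 9.5000 | NaN | S |
| 265 | 266 | 0 | 2 | Reeves, Mr. David | male | 36.0 | 0 | 0 | C.A. 17248 | 10.5000 | NaN | S |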

Dataset B

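| | PassengerId | Survived | Pclass | Name | Sex | Age | SibSp | Parch | Ticket | Fare | Cabin | Embarked |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 54 | 55 | 0 | 1 | Ostby, Mr. Engelhart Cornelius | male | 65.00 | 0 | 1 | 113509 | 61.9792 | B30 | C |
| 468 | 469 | 0 | 3 | Scanlan, Mr. James | male | NaN | 0 | 0 | 36209 | 7.7250 | NaN | Q |
| 42 | 43 | 0 | 3 | Kraeff, Mr. Theodor | male | NaN | 0 | 0 | 349253 | 7.8958 | NaN | C |
| 733 | 734 | 0 | 2 | Berriman, Mr. William John | male | 23.00 | 0 | 0 | 28425 | 13.0000 | NaN | S |
| 637 | 638 | 0 | 2 | Collyer, Mr. Harvey | male | 31.00 | 1 | 1 | C.A. 31921 | 26.2500 | NaN | S |
| 486 | 487 | 1 | 1 | Hoyt, Mrs. Frederick Maxfield (Jane Anne Forby) | female | 35.00 | 1 | 0 | 19943 | 90.0000 | C93 | S |
| 540 | 541 | 1 | 1 | Crosby, Miss. Harriet R | female | 36.00 | 0 | 2 | WE/P 5735 | 71.0000 | B22 | S |
| 756 | 757 | 0 | 3 | Carlsson, Mr. August Sigfrid | male | 28.00 | 0 | 0 | 350042 | 7.7958 | NaN | S |
| 354 | 355 | 0 | 3 | Yousif, Mr. Wazli | male | NaN | 0 | 0 | 2647 | 7.2250 | NaN | C |
| 755 | 756 | 1 | 2 | Hamalainen, Master. Viljo | male | 0.67 | 1 | 1 | 250649 | 14.5000 | NaN | S |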

Duplicate rows

Dataset A

Dataset does not contain duplicate rows.

Dataset B

Dataset does not contain duplicate rows.
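Both datasets report zero duplicate rows. This is consistent with the alerts: `PassengerId` is unique in both datasets, so no two full rows can be identical. The minimal sketch below shows how such a count can be produced. It assumes that a row is a duplicate only when every column value matches another row exactly, and it reads `# duplicates` as the number of times each repeated row occurs. It is a plain-Python illustration, not ydata-profiling's implementation, and its toy rows are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, List, Tuple

Row = Tuple[Hashable, ...]


def duplicate_rows(rows: List[Row]) -> Dict[Row, int]:
    """Map each row that appears more than once to its number of occurrences.

    Represent missing values as None: float('nan') does not compare equal to a
    different NaN object, so rows containing NaN might not be matched.
    """
    counts = Counter(rows)
    return {row: n for row, n in counts.items() if n > 1}


rows = [  # hypothetical toy rows: (PassengerId, Name)
    (1, "A"),
    (2, "B"),
    (1, "A"),
]
print(duplicate_rows(rows))      # {(1, 'A'): 2}
print(duplicate_rows(rows[:2]))  # {} (no duplicate rows)
```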